{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Datasets for the book\n",
    "\n",
    "Here we provide links to the datasets used in the book.\n",
    "\n",
    "Important Notes:\n",
    "\n",
    "1. Note that these datasets are provided on external servers by third parties\n",
    "2. Due to security issues with github you will have to cut and paste FTP links (they are not provided as clickable URLs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python and the Surrounding Software Ecology\n",
    "\n",
    "### Interfacing with R via rpy2\n",
    "\n",
    "* sequence.index\n",
    "Please FTP from this URL(cut and paste)\n",
    "\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/historical_data/former_toplevel/sequence.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Next-generation Sequencing (NGS)\n",
    "\n",
    "## Working with modern sequence formats\n",
    "* SRR003265.filt.fastq.gz\n",
    "Please FTP from this URL (cut and paste)\n",
    "\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265.filt.fastq.gz\n",
    "\n",
    "## Working with BAM files\n",
    "* NA18490_20_exome.bam\n",
    "Please FTP from this URL (cut and paste)\n",
    "\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/exome_alignment/NA18489.chrom20.ILLUMINA.bwa.YRI.exome.20121211.bam\n",
    "\n",
    "* NA18490_20_exome.bam.bai\n",
    "Please FTP from this URL (cut and paste)\n",
    "\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/exome_alignment/NA18489.chrom20.ILLUMINA.bwa.YRI.exome.20121211.bam.bai\n",
    "\n",
    "## Analyzing data in Variant Call Format (VCF)\n",
    "\n",
    "* tabix link:\n",
    "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/supporting/vcf_with_sample_level_annotation/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5_extra_anno.20130502.genotypes.vcf.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Genomics\n",
    "\n",
    "### Working with high-quality reference genomes\n",
    "\n",
    "* [falciparum.fasta](http://plasmodb.org/common/downloads/release-9.3/Pfalciparum3D7/fasta/data/PlasmoDB-9.3_Pfalciparum3D7_Genome.fasta)\n",
    "\n",
    "### Dealing with low low-quality genome references\n",
    "\n",
    "\n",
    "* gambiae.fa.gz\n",
    "Please FTP from this URL (cut and paste)\n",
    "ftp://ftp.vectorbase.org/public_data/organism_data/agambiae/Genome/agambiae.CHROMOSOMES-PEST.AgamP3.fa.gz\n",
    "\n",
    "* [atroparvus.fa.gz](https://www.vectorbase.org/download/anopheles-atroparvus-ebroscaffoldsaatre1fagz)\n",
    "\n",
    "\n",
    "### Traversing genome annotations\n",
    "\n",
    "* [gambiae.gff3.gz](http://www.vectorbase.org/download/anopheles-gambiae-pestbasefeaturesagamp42gff3gz)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PopGen\n",
    "\n",
    "### Managing datasets with PLINK\n",
    "\n",
    "* [hapmap.map.bz2](http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/hapmap3/plink_format/draft_2/hapmap3_r2_b36_fwd.consensus.qc.poly.map.bz2)\n",
    "* [hapmap.ped.bz2](http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/hapmap3/plink_format/draft_2/hapmap3_r2_b36_fwd.consensus.qc.poly.ped.bz2)\n",
    "* [relationships.txt](http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/hapmap3/plink_format/draft_2/relationships_w_pops_121708.txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PDB\n",
    "\n",
    "### Parsing mmCIF files with Biopython\n",
    "\n",
    "* [1TUP.cif](http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=cif&compression=NO&structureId=1TUP)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python for Big genomics datasets\n",
    "\n",
    "### Setting the stage for high-performance computing\n",
    "\n",
    "These are the exact same files as _Managing datasets with PLINK_ above\n",
    "\n",
    "### Programing with lazyness\n",
    "* SRR003265_1.filt.fastq.gz Please ftp from this URL (cut and paste):\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265_1.filt.fastq.gz\n",
    "\n",
    "\n",
    "* SRR003265_2.filt.fastq.gz Please ftp from this URL (cut and paste):\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265_2.filt.fastq.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Python for Big genomics datasets\n",
    "\n",
    "### Inferring shared chromosomal segments with Germline\n",
    "\n",
    "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/shapeit2_phased_haplotypes/\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
